Dispatch of instructions to multiple execution units.



(57) A multiple execution unit processing system is provided wherein each execution unit 17, 19 has an associated instruction buffer 2, 4 and all instructions are executed in order. The first execution unit (unit 0) always contains the oldest instruction and the second unit (unit 1) the newest. Processor instructions, such as load, store, add and the like, are provided to each of the instruction buffers (0, 1) from an instruction cache buffer. The first (oldest) instruction is placed in buffer 0 and the next (second) instruction is stored in buffer 1. During the decode stage it is determined whether the instructions are dependent on an immediately preceding instruction. If both instructions are independent of other instructions, they can execute in parallel. However, if the second instruction is dependent on the first, then (after the first instruction has executed) it is laterally shifted to the first instruction buffer. Instructions are also defined as being dependent on an unavailable resource. In most cases these "unavailable" instructions are allowed to execute in parallel on the execution units.
The present invention relates generally to the dispatch of instructions in a processing system having multiple execution units.

Processing systems with multiple execution units are currently known. The great majority of conventional systems include multiple special purpose execution units for performing operations such as add, store, load, subtract, multiply, branch, and the like. To solve the problems associated with instruction dependencies, conventional systems place each instruction in a buffer associated with the corresponding special purpose execution unit. For example, a load operation is placed in a buffer associated with the load unit, and so on. If instruction dependencies are present, conventional systems simply hold the later instruction, which depends on the outcome of a previous instruction. Once the previous instruction has executed, the later dependent instruction is allowed to execute on the specific execution unit. It should be noted that in conventional systems instructions may be shifted between the buffers associated with various execution units, but this shifting depends on the relationship between the type of instruction and the particular execution unit that can execute that type of instruction. That is, a load instruction may be shifted to a buffer associated with the load execution unit. Also, these prior art multiple execution unit systems generally execute instructions out of order.

U.S. patent 5,133,077 shows multiple distinct execution units, each having responsibility for specific types of instructions. Therefore, each instruction must be stored in a specific buffer associated with one of the execution units, based on the type of instruction. These instructions cannot be shifted to a buffer associated with another execution unit because they are specific to a certain type of execution unit. Thus, when instruction dependencies are discovered, this system has no alternative but to hold the later dependent instruction until the previous instruction, on which the held instruction depends, has completed execution.

U.S. 4,837,678 discusses a system having an instruction sequencer including a shifting circuit that receives instructions and shifts them based on the type of instruction and on which execution unit is required to execute the instruction (column 11, lines 8-30).

U.S. 4,847,755 is a processing system having a plurality of processor elements, which analyzes the instruction stream and adds intelligence to it. For example, the system looks for natural concurrences (independent instructions) and adds intelligence, including a logical processor number and an instruction firing time, to each instruction (column 17, lines 54-60), which essentially reorders the instructions. Logical resource drivers (column 18, lines 10-25) then deliver each instruction to the selected processing element.

U.S. 5,075,840 discusses a system with multiple processors which can execute instructions out of order. This system includes the capability to delay execution of a specific type of instruction until it can be executed in its appropriate sequential order.

It can be seen that none of the conventional systems provides a general solution to the problems associated with executing dependent instructions in a system having multiple execution units while preserving the sequence of all instructions. Many conventional systems, by executing instructions out of order, require a sophisticated branching mechanism, which adds a great deal of complexity to the processing system.

Viewed from a first aspect, the present invention provides a computer processing system including a first and second execution unit, comprising: first and second instruction buffers associated with said first and second execution units for providing instructions thereto; means for interpreting whether said instructions in said first and second instruction buffers are independent instructions or data dependent instructions; and means for providing said data dependent instructions to said first execution unit, such that independent instructions can be concurrently provided to said second execution unit via the second instruction buffer.

Viewed from a second aspect, the present invention provides a method of operating a computer processing system including a first and second execution unit, the method comprising the steps of: employing first and second instruction buffers associated with said first and second execution units to provide instructions thereto; interpreting whether said instructions in said first and second instruction buffers are independent instructions or data dependent instructions; and providing said data dependent instructions to said first execution unit, such that independent instructions can be concurrently provided to said second execution unit via the second instruction buffer.

The present invention provides a system wherein instructions are analyzed during the decode stage to determine whether they are independent, dependent or "unavailable". For the purposes of the present invention, dependent instructions are defined as only those instructions which are dependent on the immediately preceding instruction. Instructions which depend on the availability of a value in a resource, e.g. a register, are defined as "unavailable" and, in some cases, may be treated by the present invention as independent and shifted in parallel (but not out of order). Of course, totally independent instructions are executed in parallel.

Broadly, a dual execution unit processing system is provided wherein each execution unit has an associated instruction buffer. The first execution unit (unit 0) always contains the oldest instruction and the second unit (unit 1) the newest. Processor instructions, such as load, store, add and the like, are provided to each of the instruction buffers (0, 1) from an instruction cache buffer. The first (oldest) instruction is placed in buffer 0 and the next (second) instruction is stored in buffer 1. During the decode stage it is determined whether the instructions are dependent, and must be executed one at a time, or whether they are independent or "unavailable" and can execute in parallel. If the second instruction is dependent on the first, then (after the first instruction has executed) the second is laterally shifted from buffer 1 to the first instruction buffer 0. This shift is due entirely to the dependency of the second instruction on the first instruction. The shifted instruction then becomes the first instruction and a new "second" instruction is received in the second instruction buffer. It should be noted that all instructions execute in sequential order. If a long run of sequential dependent instructions is encountered, they are executed in order by execution unit 0 only. That is, the first two instructions are determined to be dependent and placed in the two instruction buffers. Since the instruction in the second buffer is dependent on the first, it shifts to the first instruction buffer after the first instruction executes. The next instruction is then placed in the second instruction buffer 1 and, if it is also dependent, it is in turn shifted to the first buffer after the previous instruction has executed, and so on. If the next instructions are independent, both execute in parallel on the first and second execution units. In this manner, processor instructions are executed efficiently and in order, based on their dependencies with other instructions rather than on the capabilities of the execution units.
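The pairing and lateral-shift behaviour described above can be sketched in a few lines of Python. This is a minimal illustrative model, not the boolean logic of decode units 3 and 5: the `Instr` record and the `dispatch` function are hypothetical names, only the dependent/independent distinction is modelled (the add-collapsing exception and "unavailable" instructions are ignored), and "dependent" is assumed to mean that the second instruction reads the register written by the first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    tag: str                # label used only for tracing (hypothetical)
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def dispatch(program: List[Instr]) -> List[Tuple[str, Optional[str]]]:
    """Return, per dispatch cycle, the tags issued to unit 0 and unit 1."""
    schedule: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(program):
        first = program[i]  # oldest instruction, always in buffer 0
        second = program[i + 1] if i + 1 < len(program) else None  # buffer 1
        if second is not None and first.dest in second.srcs:
            # Dependent: only unit 0 issues; the second instruction is
            # shifted laterally into buffer 0 for the next cycle.
            schedule.append((first.tag, None))
            i += 1
        else:
            # Independent (or nothing left): both issue, still in program order.
            schedule.append((first.tag, second.tag if second is not None else None))
            i += 2
    return schedule


chain = [
    Instr("a", "R1", ("R2", "R3")),
    Instr("b", "R4", ("R1", "R5")),   # reads a's result
    Instr("c", "R6", ("R4", "R7")),   # reads b's result
    Instr("d", "R8", ("R9", "R10")),  # independent of c
]
print(dispatch(chain))  # [('a', None), ('b', None), ('c', 'd')]
```

With a chain of dependent instructions only unit 0 issues, one instruction per cycle, until an independent instruction arrives and both units are used again.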

From the above description it can be seen that the present invention provides a system having multiple instruction buffers wherein instructions are defined as dependent, independent or dependent on an unavailable resource. These instructions are executed sequentially or in parallel depending upon their definition and relationship to other instructions.

By preserving the order of the instructions, the present invention improves performance by reducing overhead, e.g. that of determining when a branch will occur.

A processor according to the invention can operate with many types of computer systems wherein instructions are analyzed and executed based only on their dependencies with other instructions and not on the capabilities of the execution units.

The present invention will now be described further, by way of example only, with reference to a preferred embodiment thereof as illustrated in the accompanying drawings, in which:

Figure 1 is a block diagram showing various components of a multiple execution processor capable of utilizing the present invention;

Figure 2 is a more detailed diagram of a preferred embodiment of the present invention illustrating the instruction buffers which are associated with each execution unit and the instruction flow paths;

Figures 3a and 3b are flowcharts illustrating a preferred embodiment of the present invention wherein the first and second instructions are shown as being dependent, based on various criteria; and

Figure 4 is a timing diagram showing examples of the number of machine cycles required by the preferred embodiment of the present invention to execute independent and dependent instructions.

Referring to Figure 1, a high level block diagram is shown of the various components in a multiple execution processor according to the preferred embodiment of the present invention. Reference numeral 1 denotes an instruction buffer which receives instructions from instruction buses A, B, C, D. The instruction buffer 1 is a storage device, such as hardware memory, as is known in the art. The register file input latches 10 receive data from a dual port data cache unit (not shown) which is connected to system memory (not shown). It should be noted that a preferred embodiment of the present invention is a processor having two execution units, although processing systems having more than two execution units are contemplated. Decode control units 3 and 5 (corresponding to execution units 0 and 1, respectively) are used to interpret instructions received from instruction buffer 1. The decode units 3, 5 are able to recognize the instructions as load, store, add, or the like. Each decode unit 3, 5 has a corresponding register file 7, 9, respectively, which receives data from the input latches 10, either directly from the data cache bus 36 or through write back data line 35. Decode units 3 and 5 provide the read/write control signals to the register files 7 and 9, based on the instructions received from buffer 1. These control signals determine which data will be written from the input latches 10 to register files 7, 9, and read from register files 7, 9 to execution input latches 11. It can be seen from Figure 1 that register file 7 allows four data words to be written from the register file input latches 10 and three data words to be read therefrom by execution latches 11. Register file 9 allows four data words to be written from register file input latches 10 and four words to be read therefrom by execution unit latches 11.

It should be noted that the preferred embodiment of the present invention is implemented in boolean logic contained within decode control units 3 and 5. This logic implements the process of interpreting the instructions and determines whether one or two instructions may be shifted into execution units 17 and 19. The logic flow of the preferred embodiment, illustrated in the flowcharts of Figures 3a and 3b, may be readily transformed into the hardware logic implemented by the decode units 3 and 5 by one of average skill in the art.

Those skilled in the art will also understand that the present invention is a processor with a pipelined architecture. This means that data is latched, or held, at various stages. In this way, the results of executed instructions, or the like, are saved such that different instructions can be concurrently input to the processing system. Thus, a continuous flow of data through the processor is possible. Execution unit input latches 11 hold data provided by register files 7 and 9 before the data is input to the actual execution units. Multiplexers (not shown) are also included in execution unit input latches 11, which can combine the data signals received from the data cache bus 36 such that the data can be bypassed to the execution units. It should be noted that register file input latches 10 also include multiplexers in order to allow the combination of data from the data cache unit bus 36 and write back bus 35.

Execution units 17 and 19 are both full function units capable of executing a plurality of instructions. These units may be identical; however, this is not required by the present invention. In a preferred embodiment, units 17 and 19 are not identical, but are two full function execution units, one having slightly different features to handle the specific requirements of a few specialized instructions. Execution unit 17 includes a dual port adder 21 that performs add functions, and a logic unit 23 which is used to execute instructions such as rotate, or the like. A three port adder 25 is included in execution unit 19, along with logic unit 27. The three port adder 25 provides a major saving in the number of cycles required to execute add-type instructions, which include load, store, compare and the like. This is due to the ability of the preferred embodiment of the invention to treat all add-type instructions as independent instructions when determining which instructions can be executed in parallel. In prior art systems, add-type instructions are considered dependent, i.e., the second instruction must wait until completion of the previous instruction. For example, the instructions (1) ADD R1, R2, R3 and (2) ADD R5, R1, R4 are typically used to add the values in register files R2 and R3 and put the new value in register file R1. The next instruction must wait until the value is placed in register file R1, because this value is added to the value in register file R4 to obtain the desired result, which is then placed in register file R5. It can be seen that the desired result is actually the sum of the values in register files R2, R3 and R4. Thus, to perform this operation, conventional systems require execution of two instructions over two machine cycles.

In contrast, the present invention is capable of performing the same operation in a single machine cycle by providing a mechanism that allows ADD instructions to be defined as independent, and therefore to execute in parallel. Using the previous example, when instructions ADD R1, R2, R3 and ADD R5, R1, R4 are decoded by decode control units 3 and 5, they are interpreted to mean ADD R1, R2, R3 and ADD R5, R2, R3, R4, i.e. the dependent instructions are collapsed into independent instructions wherein the values in R2 and R3 are substituted for the value in R1 in the second instruction. The first add instruction, ADD R1, R2, R3, must still be executed, because the value placed in register file R1 may be required by instructions other than the following add instruction (ADD R5, R1, R4). Further, due to the existence of three port adder 25, both of these instructions can be executed simultaneously. That is, ADD R1, R2, R3 is executed on two port adder 21 and ADD R5, R2, R3, R4 is concurrently executed on three port adder 25. In this manner, the preferred embodiment of the present invention is capable of defining add-type instructions as being independent of any other instruction.
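The substitution of R2 and R3 for R1 can be illustrated with a small Python sketch. The `Add` record and the `collapse` and `execute_parallel` helpers are hypothetical; the sketch assumes two-source ADDs, collapses only when the second ADD reads the first one's result exactly once, and models simultaneous execution on adders 21 and 25 by reading every operand before writing any result.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Add:
    dest: str
    srcs: Tuple[str, ...]  # two sources normally, three after collapsing


def collapse(first: Add, second: Add) -> Add:
    """Rewrite `second` so that it no longer reads the result of `first`."""
    if len(second.srcs) != 2 or second.srcs.count(first.dest) != 1:
        return second  # independent already, or not a simple two-source case
    others = tuple(r for r in second.srcs if r != first.dest)
    # Substitute first's two sources for its destination: a three-input add.
    return Add(second.dest, first.srcs + others)


def execute_parallel(regs: Dict[str, int], *adds: Add) -> Dict[str, int]:
    """Read every operand from the old register values, then write all
    results (later instructions win), as two units in one cycle would."""
    results = {a.dest: sum(regs[r] for r in a.srcs) for a in adds}
    return {**regs, **results}


i1 = Add("R1", ("R2", "R3"))
i2 = Add("R5", ("R1", "R4"))
c2 = collapse(i1, i2)
print(c2)  # Add(dest='R5', srcs=('R2', 'R3', 'R4'))

regs = {"R1": 0, "R2": 2, "R3": 3, "R4": 4, "R5": 0}
after = execute_parallel(regs, i1, c2)
print(after["R1"], after["R5"])  # 5 9, same as executing i1 then i2 in sequence
```

Both instructions are still executed, so R1 receives its value for any later reader, while R5 no longer waits for it.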

Execution control units 13 and 15 are also provided, which receive decoded instructions from decode units 3, 5 and input the instructions, in the form of control signals, to execution units 17 and 19, respectively. During execution of load and store instructions by units 17 and 19, adders 21 and 25 calculate an effective address for the data being manipulated (i.e. where in the cache the referenced data is located) and, for store instructions, also calculate the data values themselves. The effective address is then transferred to the dual port data address translation logic unit 33, which translates the previously calculated effective address from the execution units to a physical address (i.e. where in memory the referenced data is located). Data is returned from the data cache on bus 36, is input to register file input latches 10, and may be bypassed to execution input latches 11 (via data cache bus 36). For other types of instructions which manipulate data in the processor, such as add instructions, the data values from execution units 17, 19 are input to register file input latches 10 and may be bypassed to the execution input latches 11 (via write back bus 35).

Figure 2 is a more detailed view of instruction buffer 1, shown in Figure 1. Instruction buses A, B, C and D input instructions from an instruction cache unit, or the like, to a bus 8, which transmits the instructions directly to the actual instruction buffers 14, which are first-in, first-out hardware storage devices. Output instruction buffers 2 and 4 receive the instructions, either directly from bus 8 or from buffers 14 via bus 12. Instruction buffers 2 and 4 correspond to execution units 17 and 19, respectively. Further, a bus 6 is provided that allows instructions to be shifted, or transferred, between output buffers 2 and 4. Since the preferred embodiment of the present invention includes two full function processing units, instructions can be shifted between output buffers 2 and 4 for execution by either unit 17, 19. The significance of this capability will be more fully described below in conjunction with Figures 3a and 3b. The instructions to be executed are then output from buffers 2 and 4 to decode control units 3 and 5.
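A software model of the buffer arrangement in Figure 2 may help make the data paths concrete. The class and method names below are hypothetical; the deque stands in for FIFO buffers 14, and `shift_one` and `shift_two` follow the meanings of "shifting one" and "shifting two" instructions given in the flowchart discussion of Figures 3a and 3b. Instructions are plain strings here for simplicity.

```python
from collections import deque
from typing import Iterable, Optional, Tuple


class InstructionBuffer:
    """Hypothetical model of instruction buffer 1 in Figure 2.

    fifo ~ FIFO buffers 14 (filled from bus 8)
    buf0 ~ output buffer 2, feeding execution unit 17 (always the oldest)
    buf1 ~ output buffer 4, feeding execution unit 19 (the next one)
    """

    def __init__(self, stream: Iterable[str]):
        self.fifo = deque(stream)
        self.buf0: Optional[str] = None
        self.buf1: Optional[str] = None
        self._refill()

    def _refill(self) -> None:
        # Preserve program order: buffer 2 is filled before buffer 4.
        if self.buf0 is None and self.fifo:
            self.buf0 = self.fifo.popleft()
        if self.buf1 is None and self.fifo:
            self.buf1 = self.fifo.popleft()

    def shift_one(self) -> Optional[str]:
        """Issue buffer 2 to unit 17; move buffer 4 laterally (bus 6) into buffer 2."""
        issued = self.buf0
        self.buf0, self.buf1 = self.buf1, None
        self._refill()
        return issued

    def shift_two(self) -> Tuple[Optional[str], Optional[str]]:
        """Issue buffers 2 and 4 to units 17 and 19 in the same cycle."""
        issued = (self.buf0, self.buf1)
        self.buf0 = self.buf1 = None
        self._refill()
        return issued


ib = InstructionBuffer(["i1", "i2", "i3", "i4", "i5"])
print(ib.shift_two())    # ('i1', 'i2')
print(ib.shift_one())    # i3  (i4 moves laterally from buffer 4 to buffer 2)
print(ib.buf0, ib.buf1)  # i4 i5
```

Because the oldest instruction always sits in `buf0`, either kind of shift keeps the stream in program order.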

Figures 3a and 3b show the relationship between the instructions defined by the preferred embodiment of the present invention. One category is defined as instructions which are dependent on previous instructions other than the immediately prior instruction. That is, a value loaded from memory by a LOAD instruction may be utilized by subsequent instructions that are executed several machine cycles later. For example, assume the following instructions are encountered:

(1) LOAD R1, R31, R30 

(2) ADD R4, R3, R2 

(3) ADD R5, R1, R2 

The first instruction (1) loads a value into register file R1: an effective address in memory is computed by adding R31 and R30, and the data at the computed address is loaded into R1. Next, instruction (2) adds the contents of register files R3 and R2 and places the resulting value in register file R4. Instruction (3) then uses the value previously loaded in register file R1, adds it to the value in register file R2 and puts the result in register file R5. It can be seen that instruction (3) is dependent on a value in a resource (register file R1) that may be unavailable, i.e. the value to be loaded into register file R1 may have to come from system memory, which may take many machine cycles.

The present invention defines those instructions which are dependent on a previous instruction, other than the immediately preceding instruction, as "unavailable", because they depend on a potentially unavailable resource (register file). These "unavailable" instructions are allowed to move in parallel into execution units 17 and 19 as if they were independent. However, as will be discussed in greater detail below, there are other conditions, such as when an "unavailable" instruction precedes a dependent instruction, in which these instructions must execute sequentially. These "unavailable" instructions, along with dependent instructions, are considered to be data dependent instructions, i.e. instructions that have some type of data dependency, whether based on the immediately preceding instruction or on another previous instruction.

Therefore, in accordance with the present invention, all instructions are classified as either: (1) independent; (2) dependent; or (3) unavailable. The present invention provides a mechanism which determines what combinations of these instructions can execute simultaneously on execution units 17 and 19, and which of the instructions must execute sequentially. Instructions are provided to instruction buffers 2 and 4 in pairs, with the first instruction being provided to instruction buffer 0 (reference numeral 2) and the second instruction to buffer 1 (reference numeral 4). In this manner all instructions are either executed one at a time in sequential order, or executed in parallel with the original order also maintained. The sequential order is preserved because the oldest instruction is always placed in instruction buffer 2 and considered to be executed prior to the instruction in buffer 4. As previously noted, dependent instructions are defined as only those instructions which are dependent on the immediately previous instruction, with the exception of add instructions using the three port adder, which are considered independent for the purposes of the preferred embodiment of the present invention. Independent instructions are defined as those which do not require any results, values, data, resources, or the like from, or utilized by, any previous instruction. "Unavailable" instructions are those defined above as being dependent on an instruction other than the immediately preceding instruction.
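The three-way classification can be sketched as a Python function. The names are hypothetical and the sketch makes simplifying assumptions: a dependency means reading a register written by an earlier instruction; every earlier writer other than the immediately preceding instruction is treated as possibly still in flight, and therefore makes the reader "unavailable"; and the three port adder exception is approximated as "an ADD that reads the result of an immediately preceding ADD is independent".

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                # "LOAD", "ADD", "OR", ...
    dest: str              # register written
    srcs: Tuple[str, ...]  # registers read (for LOAD: the address operands)


def classify(program: Sequence[Instr], i: int) -> str:
    """Return 'dependent', 'unavailable' or 'independent' for program[i]."""
    instr = program[i]
    reads = set(instr.srcs)
    prev = program[i - 1] if i > 0 else None
    if prev is not None and prev.dest in reads:
        # Three port adder exception (approximated): ADD after ADD is independent.
        if not (instr.op == "ADD" and prev.op == "ADD"):
            return "dependent"
    # Writers older than the immediately preceding instruction (assumed in flight).
    older = {p.dest for p in program[: max(i - 1, 0)]}
    if prev is not None:
        older.discard(prev.dest)  # that value comes from prev, not an older writer
    if older & reads:
        return "unavailable"
    return "independent"


example = [
    Instr("LOAD", "R1", ("R31", "R30")),  # (1)
    Instr("ADD", "R4", ("R3", "R2")),     # (2)
    Instr("ADD", "R5", ("R1", "R2")),     # (3)
]
print([classify(example, i) for i in range(len(example))])
# ['independent', 'independent', 'unavailable']
```

Applied to the LOAD/ADD/ADD example above, only instruction (3) is flagged, because R1 is produced by an instruction other than its immediate predecessor.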

As noted above, Figures 3a and 3b are flowcharts describing the execution of the previously defined instructions in accordance with the preferred embodiment of the present invention, so as to utilize the dual execution units 17 and 19 most efficiently. First, a general description of the process will be presented, followed by a specific example using common instructions defined as set forth above. Further, the following description covers the case where two execution units are present; however, it should be understood that processing systems having different numbers of execution units are contemplated by the present invention.

At step 1, the process is started and step 2 determines whether both execution units 17 and 19 are available. If not, the system is held at step 3 until both units are available for use. Once the units are available, step 4 determines whether the first and second instructions in the instruction stream (from instruction buffers 2 and 4 of Figure 2) are both independent or both dependent instructions. If both the first and second instructions are independent, both instructions are shifted into the machine, decoded by units 3 and 5, and executed simultaneously on execution units 17 and 19 (step 8). If the first and second instructions are not both independent instructions, the method proceeds to step 5, wherein it is determined whether the first and second instructions are both to be held off for some reason. This occurs when the contents of a register are needed but not yet available. This type of instruction is not dependent on the outcome of the immediately previous instruction, but on some instruction or resource defined as "unavailable". If both the first and second instructions are unavailable, both instructions are shifted at step 8 since, regardless of where the instructions are located (instruction buffer or decode), the system will have to wait for the "unavailable" resources needed by these instructions to become accessible. Therefore, "unavailable" instructions can be shifted to the execution units 17 and 19, allowing new instructions to be moved into buffers 2 and 4.

If the first instruction is dependent on an unavailable register file entry and the second is independent (step 6), both of these instructions are shifted to the execution units at step 8. In this situation, the present invention allows two instructions to be shifted, because the "unavailable" instruction is moved into execution unit 17 to wait for access to the required data, and the independent instruction can wait in execution unit 19 until the "unavailable" instruction is ready, after which both instructions are executed in parallel. However, if the first instruction is not "unavailable", or the second instruction is not independent, the instruction execution method of the present invention proceeds to step 7, which determines whether the first instruction is independent and the second instruction is "unavailable". In the situation of step 7, only a single instruction is shifted, because it is necessary to move the unavailable instruction over to instruction buffer 2: if the instruction following the unavailable instruction is either independent or unavailable, two instructions can then be shifted during the next machine cycle. Also, because the present invention executes all instructions in order, it is advantageous to have instruction buffer 4 ready to receive the next instruction. That is, if the system has to wait until the unavailable instruction executes, it is most efficient to have that instruction waiting in buffer 2, where it will be the next instruction to execute. In this way, a subsequent instruction can be moved into instruction buffer 4, thereby avoiding a bubble in the pipeline. A cycle would be wasted if the machine had to laterally shift the unavailable instruction over to buffer 2 after the resource became available. Thus, a continuous flow of instructions is ensured, since the next subsequent instruction can be placed in buffer 4 after the shift of the unavailable instruction from buffer 4 to buffer 2.

If the conditions of step 7 are met, the method continues to step 11, where a single instruction is shifted to the machine for decoding and execution. It should be noted that shifting one instruction is defined herein to mean shifting the instruction in instruction buffer 2 to execution unit 17 and shifting the instruction in instruction buffer 4 to instruction buffer 2. Shifting two instructions means simultaneously shifting two instructions from buffers 2 and 4 into execution units 17 and 19. However, if the first instruction is not independent or the second instruction is not "unavailable", step 9 determines whether the first instruction is independent and the second instruction is dependent. If so, the process continues to step 11, where the independent instruction is shifted to the machine (from buffer 2 to execution unit 17) and executed, and the dependent instruction is shifted from instruction buffer 4 to instruction buffer 2. It should be noted that the dependent instruction is now dependent on an instruction that has already executed and is no longer considered to be dependent. Subsequent to step 11, the process loops back to step 2, another (second) instruction is provided by bus 8 or buffer 14, and these two instructions are considered according to the criteria of steps 2, 4-7, 9 and 10.

If, at step 9, it is determined that the first instruction is not independent or the second is not dependent, step 10 determines whether the first instruction is unavailable and the second instruction is dependent. If so, the method proceeds to step 11, wherein the first instruction is provided to the machine and the second is shifted to the first instruction output buffer (shift 1). It can be seen how the determination blocks (steps 4-10) of the preferred embodiment of the present invention address each type of defined instruction to ensure that the maximum number of instructions are executed in parallel while the sequential order is preserved.
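Steps 2 to 10 amount to a small decision table mapping the classes of the instructions in buffers 2 and 4 to the number of instructions shifted. The Python sketch below encodes only the combinations the text spells out; the table and function names are hypothetical, and returning 1 for any other combination (including the "both dependent" case that step 4 mentions without describing its outcome) is an assumption.

```python
from typing import Dict, Tuple

# (class of instruction in buffer 2, class of instruction in buffer 4) -> instructions shifted
SHIFT_RULES: Dict[Tuple[str, str], int] = {
    ("independent", "independent"): 2,  # step 4 -> step 8
    ("unavailable", "unavailable"): 2,  # step 5 -> step 8
    ("unavailable", "independent"): 2,  # step 6 -> step 8
    ("independent", "unavailable"): 1,  # step 7 -> step 11
    ("independent", "dependent"): 1,    # step 9 -> step 11
    ("unavailable", "dependent"): 1,    # step 10 -> step 11
}


def instructions_to_shift(first: str, second: str, units_available: bool = True) -> int:
    """Return 0 (hold), 1 (shift one) or 2 (shift two) for this cycle."""
    if not units_available:
        return 0  # steps 2-3: hold until both execution units are available
    # Assumption: combinations not spelled out in the text shift one instruction.
    return SHIFT_RULES.get((first, second), 1)


print(instructions_to_shift("unavailable", "independent"))  # 2
print(instructions_to_shift("independent", "unavailable"))  # 1
```

A table keeps every tested combination visible in one place, which mirrors how the determination blocks of Figures 3a and 3b enumerate the cases.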

Next, a typical instruction sequence will be used as an example of the operation of the present invention. Assume that the following instructions are received in buffer 1 from buses A, B, C, D:

(1) LOAD R1, R31, R30

(2) LOAD R2, R29, R30

(3) ADD R22, R20, R21

(4) ADD R24, R22, R23

(5) ADD R3, R1, R2

(6) LOAD R6, R2, R22

(7) LOAD R25, R28, R29

(8) ADD R8, R6, R7

(9) LOAD R9, R27, R28

(10) LOAD R10, R31, R30

(11) ADD R11, R9, R3

(12) OR R11, R11, R20

(13) ADD R13, R11, R12

The first two instructions (1) and (2) are placed in buffers 2 and 4. These instructions load values into register files R1 and R2, based on addresses calculated from the values in register files R31, R30 and R29, R30, respectively. Instructions (1) and (2) are independent, since they do not depend on any other instructions, and both are shifted to execution units 17 and 19 simultaneously (shift 2), in accordance with step 4 of Figure 3a. Instructions (3) and (4) are then placed in buffers 2 and 4, respectively. Instruction (3) is an add instruction which sums the values of register files R20 and R21 and places the result in register file R22. Instruction (4) is another add operation which uses the result of the preceding add instruction (3), i.e. the value in R22 is added to the value in R23 and the result is placed in register file R24. Together, instructions (3) and (4) add the values in register files R20, R21 and R23. Due to the presence of three port adder 25 in the preferred embodiment of the present invention, which can sum all three values at once, these instructions are considered to be independent and are shifted to execution units 17 and 19 for concurrent execution (step 4, Figure 3a).
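One reading of why instructions (3) and (4) can run concurrently is that the dependence is folded away: instead of waiting for R22, the unit with the three port adder sums R20, R21 and R23 directly. The Python sketch below illustrates that rewrite under stated assumptions. The tuple encoding, the `ADD3` mnemonic and the helper names are hypothetical. Only an ADD that reads the result of a preceding two-source ADD in exactly one operand is folded, since a second such operand would need a fourth adder input.

```python
# Hypothetical encoding: (op, dest, (sources...)).


def fold_into_three_port(first, second):
    """Rewrite `second` so that it no longer reads the result of `first`.

    Only an ADD reading the result of a preceding two-source ADD in exactly
    one operand is folded; anything else returns None (no folding).
    """
    op1, dest1, srcs1 = first
    op2, dest2, srcs2 = second
    if op1 != "ADD" or op2 != "ADD" or len(srcs1) != 2 or srcs2.count(dest1) != 1:
        return None
    others = tuple(r for r in srcs2 if r != dest1)
    return ("ADD3", dest2, srcs1 + others)   # three adder inputs


def run_serial(instrs, regs):
    """Execute one after another; each instruction sees earlier results."""
    regs = dict(regs)
    for _op, dest, srcs in instrs:
        regs[dest] = sum(regs[r] for r in srcs)   # ADD and ADD3 both sum
    return regs


def run_parallel(instrs, regs):
    """Execute in the same cycle; all read the register values from before it."""
    before, after = dict(regs), dict(regs)
    for _op, dest, srcs in instrs:
        after[dest] = sum(before[r] for r in srcs)
    return after
```

Running instructions (3) and (4) serially, or (3) and the folded three-source add in parallel, on the same register values gives the same R22 and R24, which is what allows the pair to be treated as independent.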

Instructions (5) and (6) are then moved into buffers 2 and 4, respectively. Instruction (5) is an add instruction which is dependent on instructions (1) and (2), neither of which is the immediately preceding instruction; therefore instruction (5) is considered "unavailable". Instruction (6) is a load instruction whose effective address depends on the values in register files R2 and R22. These values are not affected by the preceding instruction (5). However, the value in R2 is produced by instruction (2) (LOAD R2), and therefore instruction (6) is also defined as "unavailable". In accordance with the present invention (step 5 of Figure 3a), both "unavailable" instructions are shifted to execution units 17 and 19, where they wait for access to the unavailable data. This allows the next two instructions to be moved into buffers 2 and 4.

Instruction (7) is a load instruction which loads a value from memory into register file R25 in the processor. Instruction (7) is independent of any other instructions or registers. Instruction (8) is an add instruction that adds the values in register files R6 and R7 and puts the sum in R8. Instruction (8) is "unavailable", since it depends on instruction (6). In this case, where the first instruction is independent and the second is unavailable, only instruction (7) in buffer 2 is shifted (to execution unit 17), while instruction (8) is laterally moved from buffer 4 to buffer 2 (step 7, Figure 3a). The next instruction (9) is then placed in buffer 4, and instructions (8) and (9) are considered. Instruction (9) is a load instruction that puts a value from memory into register file R9 in the processor, and is considered to be independent. Therefore, an unavailable instruction (8) is in buffer 2 and an independent instruction (9) is in buffer 4. In this case, both instructions (8) and (9) will be shifted to execution units 17 and 19 simultaneously, as soon as the previous instructions have executed (step 6, Figure 3a).

The next two instructions (10) and (11) are then provided to buffers 2 and 4, respectively. Instruction (10) is an independent instruction wherein the contents of a location in memory are loaded into register file R10 in the processing system. Instruction (11) is "unavailable", since it depends on a value in register file R9 which was determined by a previous instruction other than the immediately preceding one (namely instruction (9)). In the case where the first instruction (10) is independent and the second instruction (11) is unavailable, the first instruction (10) is shifted to execution unit 17 from buffer 2, and the second instruction (11) is shifted from buffer 4 to buffer 2 (step 7, Figure 3a).

Instruction (12) is an OR instruction which performs a logical "or" operation on the contents of two register files, in this case R11 and R20, and places the result in register file R11. It can be seen that instruction (12) is dependent on the previous instruction (11), which determines the value in register file R11. Thus, there is an unavailable instruction (11) in buffer 2 and a dependent instruction (12) in buffer 4. In this case, a single instruction (11) is shifted to execution unit 17 (step 10, Figure 3b). Instruction (12) cannot be shifted to execution unit 19, because it needs the value in R11. Instead, instruction (12) is shifted to buffer 2 to be executed after instruction (11). Instruction (13), which adds the contents of register files R11 and R12 and places the sum in register file R13, is then moved to buffer 4. Instruction (12) is now considered to be independent, since the instruction (11) on which it depends has executed. Instruction (13) is dependent, since it depends on the immediately preceding instruction (12). Thus, there is an independent instruction (12) in buffer 2 and a dependent instruction (13) in buffer 4. In this instance, independent instruction (12) is moved to execution unit 17 and dependent instruction (13) is shifted laterally to buffer 2 (step 9, Figure 3b).
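The whole walkthrough can be replayed with a small Python model. This is a sketch under stated assumptions, not the logic of Figures 3a and 3b. The `Instr` encoding and all function names are hypothetical. A source register written by any earlier instruction other than the immediately preceding one is assumed to be still pending, which holds over this short sequence but would not over a long program. An instruction in buffer 2 whose producer is the immediately preceding, already dispatched instruction is treated as independent, as the text does for instruction (12). The three port adder is modelled as folding an ADD that reads the previous ADD's result in one operand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    num: int      # position in the example sequence
    op: str
    dest: str     # destination register (listed first)
    srcs: tuple   # source registers


PROGRAM = [
    Instr(1, "LOAD", "R1", ("R31", "R30")),
    Instr(2, "LOAD", "R2", ("R29", "R30")),
    Instr(3, "ADD", "R22", ("R20", "R21")),
    Instr(4, "ADD", "R24", ("R22", "R23")),
    Instr(5, "ADD", "R3", ("R1", "R2")),
    Instr(6, "LOAD", "R6", ("R2", "R22")),
    Instr(7, "LOAD", "R25", ("R28", "R29")),
    Instr(8, "ADD", "R8", ("R6", "R7")),
    Instr(9, "LOAD", "R9", ("R27", "R28")),
    Instr(10, "LOAD", "R10", ("R31", "R30")),
    Instr(11, "ADD", "R11", ("R9", "R3")),
    Instr(12, "OR", "R11", ("R11", "R20")),
    Instr(13, "ADD", "R13", ("R11", "R12")),
]

# Pairs (buffer 2 class, buffer 4 class) that shift both instructions.
SHIFT_TWO = {
    ("independent", "independent"),
    ("unavailable", "unavailable"),
    ("unavailable", "independent"),
}


def latest_writer(program, idx, reg):
    """Index of the most recent instruction before idx that writes reg, or None."""
    for j in range(idx - 1, -1, -1):
        if program[j].dest == reg:
            return j
    return None


def classify(program, idx, first_slot):
    ins, prev = program[idx], idx - 1
    prev_reads, unavailable = 0, False
    for reg in ins.srcs:
        w = latest_writer(program, idx, reg)
        if w is None:
            continue                   # value predates the sequence
        if w == prev:
            prev_reads += 1
        else:
            unavailable = True         # assumption: older results still pending
    if prev_reads and not first_slot:
        # three port adder: ADD after ADD, one operand from the previous ADD
        fused = ins.op == "ADD" and program[prev].op == "ADD" and prev_reads == 1
        if not fused:
            return "dependent"
    # in buffer 2, the preceding instruction has already been dispatched
    return "unavailable" if unavailable else "independent"


def dispatch(program):
    """Return the dispatch trace as tuples of instruction numbers."""
    stream = iter(range(len(program)))
    buf2, buf4 = next(stream, None), next(stream, None)
    trace = []
    while buf2 is not None:
        c2 = classify(program, buf2, first_slot=True)
        c4 = None if buf4 is None else classify(program, buf4, first_slot=False)
        if c4 is not None and (c2, c4) in SHIFT_TWO:
            trace.append((program[buf2].num, program[buf4].num))   # shift 2
            buf2, buf4 = next(stream, None), next(stream, None)
        else:
            trace.append((program[buf2].num,))                     # shift 1
            buf2, buf4 = buf4, next(stream, None)                  # lateral move
    return trace
```

Running `dispatch(PROGRAM)` gives the issue groups (1, 2), (3, 4), (5, 6), (7), (8, 9), (10), (11), (12), (13), matching the sequence of shift 2 and shift 1 decisions described above; the final single issue of (13) is simply where the model runs out of instructions.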

Figure 4 shows two timing diagrams A and B that compare the number of machine cycles required to execute a load instruction and an add instruction when the instructions are independent (diagram A) and when they are dependent (diagram B).

With regard to diagram A, at cycle 1, the load and add instructions are stored in instruction buffers 2 and 4, respectively, and since both instructions are independent (step 4, Figure 3a), they are both shifted to execution units 13 and 15 for execution during cycle 2. At the end of cycle 2, the data from the add instruction is latched into register file input latches 10. The data for the load instruction is then accessed from the cache during cycle 3 and latched into register file input latch 10. Also during cycle 3, the control signal for the add instruction causes a write back to register files 7, 9. At cycle 4, the data from the cache that was in input register 10 is written to register 11 in register files 7, 9.

Diagram B shows the same instructions in the instruction buffers (again in the 0 and 1 positions) at cycle 1; however, in this case the add instruction is dependent on the load instruction, e.g. LOAD R1 and ADD R1, R2, R3. In this case only the load instruction is shifted to the execution control unit during cycle 2 (in accordance with step 9 of Figure 3b). Also during cycle 2, the add instruction is shifted from instruction buffer 4 to buffer 2, effectively moving this instruction for execution by execution unit 17 rather than 19. It should be noted that in this case the load and add instructions are executed in sequence on execution unit 17. Of course, additional instructions from bus 8, or buffer 14, will be provided to instruction buffer 4 during cycles 2-5 and analyzed in accordance with the process of Figures 3a and 3b. However, for the sake of simplicity, these additional instructions are not shown in diagrams A and B. During cycle 3, the load instruction causes the cache to be accessed and the requested data to be loaded into input register 10 and bypassed into execution unit input latch 11. Also, the add instruction is moved to execution control 13, where it stalls because its data is unavailable. The add instruction is then executed during cycle 4, and register files 7, 9 are written with the load data. Finally, the add instruction results are written back to register files 7, 9 during cycle 5.

It can be seen that independent instructions can be executed in fewer cycles than otherwise identical dependent instructions. Diagrams A and B show how independent instructions can be executed in parallel, on different execution units, while dependent instructions must be executed serially. The add instruction of diagram A completes before the load instruction because of its independence. In diagram B, however, the add executes after the load instruction, since it depends on it.

Those skilled in the art will understand that the present invention increases processor performance by executing more instructions per cycle than conventional systems. For example, if instructions defined as independent, unavailable, independent (I1, U, I2) are provided to a conventional dual execution unit system, I1 and U are provided to units 0 and 1. The processing system executes I1 and then, n cycles later, executes the U instruction when the resource becomes available. Thus, in a conventional system, 2 instructions are executed in n cycles. The present invention, however, after I1 executes, moves the U instruction to execution unit 0 and shifts I2 to execution unit 1. The processing system then executes both U and I2 (after the n cycles) when the resource becomes available; I2 can execute in parallel with U since it is independent. Thus, the present invention allows 3 instructions to be executed in n cycles. Further advantages arise when an unavailable instruction U2 follows U in the previous example. In this case, there is a high probability that both U and U2 can execute in parallel, since sequential unavailable instructions are often dependent on the same resource, and once it is available (after n cycles) both can execute. Therefore, the present invention again executes 3 instructions in n cycles, whereas conventional systems would execute only 2, since U2 would wait for U to execute.
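The I1, U, I2 arithmetic can be checked with a toy issue-time model. Every detail of the Python sketch below is a hypothetical simplification: each instruction is a (kind, ready_cycle) pair with kind "I" (independent) or "U" (unavailable), each dispatch decision takes one cycle, the conventional system loads a new pair only after both members of the current pair have issued, and the shifting policy covers only the independent and unavailable cases used in this example.

```python
def conventional(instrs):
    """Fixed pairs: a new pair enters units 0 and 1 only after both have issued."""
    t, times = 0, []
    for i in range(0, len(instrs), 2):
        pair = instrs[i:i + 2]
        last = max(t, pair[0][1])
        times.append(last)
        if len(pair) == 2:
            last = max(last, pair[1][1])   # in order: never before the first
            times.append(last)
        t = last + 1
    return times


def shifting(instrs):
    """Independent followed by unavailable issues alone and the unavailable
    one moves over; the other pairs used here issue together once ready."""
    t, times, i = 0, [], 0
    while i < len(instrs):
        a = instrs[i]
        b = instrs[i + 1] if i + 1 < len(instrs) else None
        if b is None or (a[0] == "I" and b[0] == "U"):
            times.append(max(t, a[1]))     # shift 1
            i += 1
        else:
            issue = max(t, a[1], b[1])     # shift 2
            times += [issue, issue]
            i += 2
        t = times[-1] + 1
    return times


def issued_by(times, n):
    """Number of instructions issued at or before cycle n."""
    return sum(1 for c in times if c <= n)


N = 5  # illustrative resource latency
I1_U_I2 = [("I", 0), ("U", N), ("I", 0)]
I1_U_U2 = [("I", 0), ("U", N), ("U", N)]
```

With n = 5, the conventional model issues I1 and U by cycle n and the third instruction one cycle later, while the shifting model issues all three by cycle n; the I1, U, U2 sequence gives the same counts.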



Claims 

1. A computer processing system including a first (17) and second (19) execution unit, comprising:

first (2) and second (4) instruction buffers associated with said first (17) and second (19) execution units for providing instructions thereto;

means for interpreting whether said instructions in said first and second instruction buffers are independent instructions or data dependent instructions; and

means for providing said data dependent instructions to said first execution unit (17), such that independent instructions can be concurrently provided to said second execution unit (19) via the second instruction buffer (4).

2. A system according to claim 1 wherein said means for providing comprises shift means for moving instructions from said second buffer to said first buffer.

3. A system according to claim 2 wherein said data dependent instructions comprise dependent instructions which are dependent on the immediately preceding instruction, or unavailable instructions which are dependent on an instruction other than the immediately preceding instruction or on the availability of a value in a resource, in predetermined cases said unavailable instructions being treated as independent instructions.

4. A system according to any of claims 1 to 3, wherein one of the execution units (17, 19) includes a three port adder (25) such that instructions used to add values stored in three distinct storage locations are defined as independent instructions.

5. A system according to any of claims 2 to 4, further comprising:

means for providing a single successive instruction to said second instruction buffer (4) subsequent to an instruction in said second instruction buffer being moved by the shift means to said first instruction buffer (2), and for providing successive instructions to each of said first (2) and second (4) instruction buffers when said instructions in said first and second instruction buffers are shifted in parallel to said first (17) and second (19) execution units.

6. A method of operating a computer processing system including a first (17) and second (19) execution unit, the method comprising the steps of:

employing first (2) and second (4) instruction buffers associated with said first (17) and second (19) execution units to provide instructions thereto;

interpreting whether said instructions in said first and second instruction buffers are independent instructions or data dependent instructions; and

providing said data dependent instructions to said first execution unit (17), such that independent instructions can be concurrently provided to said second execution unit (19) via the second instruction buffer (4).

7. A method according to claim 6 wherein said step 
of providing comprises the step of moving in- 
structions from said second buffer (4) to said first 
buffer (2). 

8. A method according to claim 7 wherein said data 
dependent instructions comprise dependent in- 
structions which are dependent on the imme- 
diately preceding instruction, or unavailable in- 
structions which are dependent on an instruction 
other than the immediately preceding instruction 
or on the availability of a value in a resource, in 
predetermined cases said unavailable instruc- 
tions being treated as independent instructions. 

9. A method according to any of claims 6 to 8, 
wherein one of the execution units (17, 19) is pro- 
vided with a three port adder (25) such that in- 
structions used to add values stored in three dis- 
tinct storage locations are defined as indepen- 
dent instructions. 

10. A method according to any of claims 6 to 9, further
comprising the step of, when independent in- 
structions are present in both the first and sec- 
ond instruction buffers, shifting said indepen- 
dent instructions in parallel from said first and 
second instruction buffers to said respective first 
and second execution units. 

11. A method according to any of claims 8 to 10, fur- 
ther comprising the step of, when unavailable in- 
structions are present in both the first and sec- 
ond instruction buffers, shifting said unavailable 
instructions in parallel from said first and second 
instruction buffers to said respective first and 
second execution units. 

12. A method according to any of claims 8 to 11 fur- 
ther comprising the steps of, when an unavailable 
instruction is present in the first instruction buffer 
and a dependent instruction in the second in- 
struction buffer: 

shifting said unavailable instruction in 
said first instruction buffer to said first execution 
unit; and 

moving said dependent instruction in said 
second instruction buffer to said first instruction 
buffer. 

13. A method according to any of claims 8 to 12 further comprising the step of, when an unavailable instruction is present in the first instruction buffer and an independent instruction in the second instruction buffer, shifting in parallel said unavailable instruction in said first instruction buffer and said independent instruction in said second instruction buffer to said respective first and second execution units.

14. A method according to any of claims 8 to 13 further comprising the steps of, when an independent instruction is present in the first instruction buffer and an unavailable instruction in the second instruction buffer:

shifting said independent instruction in said first instruction buffer to said first execution unit; and

moving said unavailable instruction in said second instruction buffer to said first instruction buffer.

15. A method according to any of claims 8 to 14, further comprising the steps of, when an independent instruction is present in the first instruction buffer and a dependent instruction in the second instruction buffer:

shifting said independent instruction in said first instruction buffer to said first execution unit; and

moving said dependent instruction in said second instruction buffer to said first instruction buffer.

16. A method according to any of claims 6 to 15 further comprising the steps of:

providing a single successive instruction to said second instruction buffer (4) subsequent to said instruction in said second instruction buffer (4) being moved to said first instruction buffer (2); and

providing successive instructions to each of said first (2) and second (4) instruction buffers when said instructions in said first and second instruction buffers are shifted in parallel to said first (17) and second (19) execution units.
